Automated Selection of Rule Induction Methods Based on Recursive Iteration of Resampling Methods and Multiple Statistical Testing

Authors

  • Shusaku Tsumoto
  • Hiroshi Tanaka
Abstract

One of the most important problems in rule induction methods is how to estimate which method is the best to use in an applied domain. While some methods are useful in some domains, they are not useful in others. It is therefore very difficult to choose one of these methods. For this purpose, we introduce multiple testing based on recursive iteration of resampling methods for rule induction (MULT-RECITE-R). This method consists of four procedures, which include the inner loop and the outer loop procedures. First, the original training samples (S0) are randomly split into new training samples (S1) and test samples (T1) using a resampling scheme. Second, S1 is again split into training samples (S2) and test samples (T2) using the same resampling scheme. Rule induction methods are applied and predefined metrics are calculated. This second procedure, as the inner loop, is repeated 10000 times. Third, rule induction methods are applied to S1, and the metrics calculated on T1 are compared with those calculated on T2. If the metrics derived from T2 predict those derived from T1, we count it as a success. The second and third procedures, as the outer loop, are iterated 10000 times. Fourth, the overall results are interpreted, and the best method is selected if the resampling scheme performs well. To evaluate this system, we apply the MULT-RECITE-R method to three UCI databases. The results show that this method statistically gives the best selection of estimation methods.
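The four procedures above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Every concrete choice here is hypothetical: the two "rule induction methods" (`majority_rule`, a majority-class baseline, and `one_attribute_rule`, a 1R-style single-attribute learner), a random holdout split as the resampling scheme, accuracy as the predefined metric, re-drawing the S0 split on every outer iteration, and the success criterion (the method ranked best by the T2 metrics summed over the inner loop is also the method ranked best on T1). The iteration counts are parameters with defaults far below the 10000 used in the abstract, so the example runs quickly.

```python
import random
from collections import Counter

# Hypothetical rule induction methods: each takes training samples, a list of
# (attribute_tuple, class_label) pairs, and returns a classifier function.
def majority_rule(train):
    # Baseline: always predict the most frequent class in the training set.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_attribute_rule(train):
    # 1R-style learner: map each value of one attribute to its majority class,
    # keeping the attribute whose rules make the fewest training errors.
    default = Counter(y for _, y in train).most_common(1)[0][0]
    best = None
    for a in range(len(train[0][0])):
        by_value = {}
        for x, y in train:
            by_value.setdefault(x[a], Counter())[y] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(rules[x[a]] != y for x, y in train)
        if best is None or errors < best[0]:
            best = (errors, a, rules)
    _, attr, rules = best
    # Attribute values unseen in training fall back to the majority class.
    return lambda x: rules.get(x[attr], default)


METHODS = {"majority": majority_rule, "one_attribute": one_attribute_rule}


def accuracy(classifier, test):
    # Assumed "predefined metric": classification accuracy on the test samples.
    return sum(classifier(x) == y for x, y in test) / len(test)


def split(samples, rng, train_frac=2 / 3):
    # Assumed resampling scheme: a random holdout split.
    shuffled = list(samples)
    rng.shuffle(shuffled)
    k = int(len(shuffled) * train_frac)
    return shuffled[:k], shuffled[k:]


def evaluate(train, test):
    # Induce a classifier with every method and score it on the test samples.
    return {name: accuracy(induce(train), test) for name, induce in METHODS.items()}


def best_method(metrics):
    return max(metrics, key=metrics.get)


def mult_recite_sketch(s0, outer=100, inner=20, seed=0):
    rng = random.Random(seed)
    successes = 0
    for _ in range(outer):                        # outer loop
        s1, t1 = split(s0, rng)                   # procedure 1 (assumed re-drawn each time)
        inner_totals = Counter()
        for _ in range(inner):                    # procedure 2: inner loop
            s2, t2 = split(s1, rng)               # S1 -> S2, T2
            inner_totals.update(evaluate(s2, t2))
        predicted = best_method(inner_totals)     # method favoured by the T2 metrics
        observed = best_method(evaluate(s1, t1))  # procedure 3: induce on S1, score on T1
        successes += predicted == observed        # hypothetical success criterion
    # Procedure 4 (interpreting the results) is left to the caller.
    return successes / outer


if __name__ == "__main__":
    # Toy data: the class equals attribute 0, so the 1R-style method should win.
    data = [((i % 2, i % 3), i % 2) for i in range(30)]
    print(f"success rate: {mult_recite_sketch(data, outer=50, inner=10):.2f}")
```

Because the resampling scheme is isolated in `split`, swapping in another sampler (for example a bootstrap or cross-validation split) would change the scheme being assessed without touching the nested loop structure.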

Similar resources

Automated Empirical Selection of Rule Induction Methods Based on Recursive Iteration of Resampling Methods

One of the most important problems in rule induction methods is how to estimate which method is the best to use in an applied domain. While some methods are useful in some domains, they are not useful in other domains. Therefore it is very difficult to choose one of these methods. For this purpose, we introduce multiple testing based on recursive iteration of resampling methods for rule-inducti...

Selection of Probabilistic Measure Estimation Method Based on Recursive Iteration of Resampling Methods

One of the most important problems in rule induction methods is how to estimate the reliability of the induced rules, which is a semantic part of the knowledge to be estimated from finite training samples. In order to estimate the errors of induced results, resampling methods such as cross-validation and the bootstrap method have been introduced. However, while the cross-validation method obtains better re...

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

An Empirical Comparison of Pattern Recognition, Neural Nets, and Machine Learning Classification Methods

Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four real-world data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by statistical uncertainty; there is no completely accurate solution to these problems. Training and ...

The Importance of Knowing When to Stop

Objectives: Component-wise boosting algorithms have evolved into a popular estimation scheme in biomedical regression settings. The iteration number of these algorithms is the most important tuning parameter to optimize their performance. To date, no fully automated strategy for determining the optimal stopping iteration of boosting algorithms has been proposed. Methods: We propose a fully data...

Publication date: 1995